Auto-Grouping Emails For Faster E-Discovery
Abstract
In this paper, we examine the application of various grouping techniques to improve the efficiency and reduce the costs of an electronic discovery process. Specifically, we create coherent groups of email documents, each of which characterizes a syntactic theme, a semantic theme, or an email thread. All documents in a group can be reviewed together, leading to a faster and more consistent review. Syntactic grouping of emails is based on near-duplicate detection, whereas semantic grouping is based on identifying concepts in the email content using information extraction. Email thread detection is achieved using a combination of segmentation and near-duplicate detection. We present experimental results on the Enron corpus that suggest that these approaches can significantly reduce review time and show that high precision and recall in identifying the groups can be achieved. We also describe how these techniques are integrated into the IBM eDiscovery Analyzer product offering.
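As a rough illustration of syntactic grouping by near-duplicate detection, the sketch below groups email bodies whose word-shingle sets overlap strongly. It is not the method from the paper: the use of word shingles with Jaccard similarity, the function names `shingles`, `jaccard`, and `group_near_duplicates`, and the parameters `k` and `threshold` are all assumptions chosen for illustration.

```python
from typing import List, Set, Tuple


def shingles(text: str, k: int = 3) -> Set[Tuple[str, ...]]:
    """Return the set of k-word shingles of a lowercased text (k is an assumed parameter)."""
    words = text.lower().split()
    if len(words) < k:
        # Texts shorter than k words yield a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: Set[Tuple[str, ...]], b: Set[Tuple[str, ...]]) -> float:
    """Jaccard similarity of two shingle sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_near_duplicates(emails: List[str], threshold: float = 0.5) -> List[List[int]]:
    """Greedy grouping: each email joins the first group whose representative
    (the group's first member) is at least `threshold` similar to it."""
    groups: List[Tuple[Set[Tuple[str, ...]], List[int]]] = []
    for idx, body in enumerate(emails):
        sig = shingles(body)
        for rep, members in groups:
            if jaccard(sig, rep) >= threshold:
                members.append(idx)
                break
        else:
            # No sufficiently similar group found: start a new one.
            groups.append((sig, [idx]))
    return [members for _, members in groups]


emails = [
    "please review the attached contract before friday",
    "please review the attached contract before friday thanks",
    "lunch at noon tomorrow in the cafeteria",
]
groups = group_near_duplicates(emails)
print(groups)
```

The pairwise comparison against group representatives is quadratic in the worst case; a production system over a corpus the size of Enron would more plausibly use a hashing-based scheme, which this sketch leaves out.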
Similar resources
Template Induction over Unstructured Email Corpora
Unsupervised template induction over email data is a central component in applications such as information extraction, document classification, and auto-reply. The benefits of automatically generating such templates are known for structured data, e.g. machine-generated HTML emails. However, much less work has been done on performing the same task over unstructured email data. We propose a techni...
Reducing E-Discovery Cost by Filtering Included Emails
As business activities become more digitalized, electronic information is often produced as vital evidence during civil litigation. The process of discovering information as evidence is getting increasingly expensive as the volume of data explodes. This surging demand calls for a solution to reduce the cost associated with discovery. In this paper, we propose filtering included emails as a me...
Email Grouping Method
In this paper we present a neural-network-based system, the Email Grouping Method (EGM), for automated grouping of emails into activities found in the email message. Email users spend a lot of time reading, replying to, and organizing their emails, which is time-consuming and can sometimes result in poorer performance of daily duties and unnecessary distractions. A new system that can manage mails...
Detection Phishing Emails Using Features Decisive Values
Phishing emails are messages designed to fool the recipient into handing over personal information, such as login names, passwords, credit card numbers, account credentials, social security numbers etc. Fraudulent emails harm their victims through loss of funds and identity theft. They also hurt Internet business, because people lose their trust in Internet transactions for fear that they will ...
Towards generic application auto-discovery
The increasing complexity of enterprise applications, the expanding number of networked machines, and the rapid deployment of Internet-based business applications (e-commerce), emphasize the importance and value of application management. One of the main problems in current application management products is the amount of time and effort needed to install and customize them. Application auto-di...
Journal: PVLDB
Volume: 4
Publication date: 2011